About Data Analysis Report

This RMarkdown file contains the report of the data analysis done for the project on forecasting daily bike rental demand using time series models in R. It contains analysis such as data exploration, summary statistics and building the time series models. The final report was completed on Thu Feb 26 18:04:17 2026.

Data Description:

This dataset contains the daily count of rental bike transactions between years 2011 and 2012 in Capital bikeshare system with the corresponding weather and seasonal information.

Data Source: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset

Relevant Paper:

Fanaee-T, Hadi, and Gama, Joao. Event labeling combining ensemble detectors and background knowledge, Progress in Artificial Intelligence (2013): pp. 1-15, Springer Berlin Heidelberg

Task One: Load and explore the data

Load data and install packages

## Import required packages
if (!require("pacman")) install.packages("pacman")
## Loading required package: pacman
pacman::p_load(tidyverse, timetk, lubridate, forecast, tseries, ggthemes)
install.packages('psych')
## Installing package into '/usr/local/lib/R/site-library'
## (as 'lib' is unspecified)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
data("bike_sharing_daily")
bike_data <-bike_sharing_daily
view(bike_data)

Describe and explore the data

describe(bike_data)
## Warning in FUN(newX[, i], ...): no non-missing arguments to min; returning Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning -Inf
##            vars   n    mean      sd  median trimmed     mad   min     max
## instant       1 731  366.00  211.17  366.00  366.00  271.32  1.00  731.00
## dteday        2 731     NaN      NA      NA     NaN      NA   Inf    -Inf
## season        3 731    2.50    1.11    3.00    2.50    1.48  1.00    4.00
## yr            4 731    0.50    0.50    1.00    0.50    0.00  0.00    1.00
## mnth          5 731    6.52    3.45    7.00    6.52    4.45  1.00   12.00
## holiday       6 731    0.03    0.17    0.00    0.00    0.00  0.00    1.00
## weekday       7 731    3.00    2.00    3.00    3.00    2.97  0.00    6.00
## workingday    8 731    0.68    0.47    1.00    0.73    0.00  0.00    1.00
## weathersit    9 731    1.40    0.54    1.00    1.33    0.00  1.00    3.00
## temp         10 731    0.50    0.18    0.50    0.50    0.23  0.06    0.86
## atemp        11 731    0.47    0.16    0.49    0.48    0.20  0.08    0.84
## hum          12 731    0.63    0.14    0.63    0.63    0.16  0.00    0.97
## windspeed    13 731    0.19    0.08    0.18    0.19    0.07  0.02    0.51
## casual       14 731  848.18  686.62  713.00  744.95  587.11  2.00 3410.00
## registered   15 731 3656.17 1560.26 3662.00 3641.72 1712.40 20.00 6946.00
## cnt          16 731 4504.35 1937.21 4548.00 4517.19 2086.02 22.00 8714.00
##              range  skew kurtosis    se
## instant     730.00  0.00    -1.20  7.81
## dteday        -Inf    NA       NA    NA
## season        3.00  0.00    -1.35  0.04
## yr            1.00  0.00    -2.00  0.02
## mnth         11.00 -0.01    -1.21  0.13
## holiday       1.00  5.63    29.75  0.01
## weekday       6.00  0.00    -1.26  0.07
## workingday    1.00 -0.79    -1.38  0.02
## weathersit    2.00  0.95    -0.15  0.02
## temp          0.80 -0.05    -1.12  0.01
## atemp         0.76 -0.13    -0.99  0.01
## hum           0.97 -0.07    -0.08  0.01
## windspeed     0.49  0.67     0.39  0.00
## casual     3408.00  1.26     1.29 25.40
## registered 6926.00  0.04    -0.72 57.71
## cnt        8692.00 -0.05    -0.82 71.65
summary(bike_data)
##     instant          dteday               season            yr        
##  Min.   :  1.0   Min.   :2011-01-01   Min.   :1.000   Min.   :0.0000  
##  1st Qu.:183.5   1st Qu.:2011-07-02   1st Qu.:2.000   1st Qu.:0.0000  
##  Median :366.0   Median :2012-01-01   Median :3.000   Median :1.0000  
##  Mean   :366.0   Mean   :2012-01-01   Mean   :2.497   Mean   :0.5007  
##  3rd Qu.:548.5   3rd Qu.:2012-07-01   3rd Qu.:3.000   3rd Qu.:1.0000  
##  Max.   :731.0   Max.   :2012-12-31   Max.   :4.000   Max.   :1.0000  
##       mnth          holiday           weekday        workingday   
##  Min.   : 1.00   Min.   :0.00000   Min.   :0.000   Min.   :0.000  
##  1st Qu.: 4.00   1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.000  
##  Median : 7.00   Median :0.00000   Median :3.000   Median :1.000  
##  Mean   : 6.52   Mean   :0.02873   Mean   :2.997   Mean   :0.684  
##  3rd Qu.:10.00   3rd Qu.:0.00000   3rd Qu.:5.000   3rd Qu.:1.000  
##  Max.   :12.00   Max.   :1.00000   Max.   :6.000   Max.   :1.000  
##    weathersit         temp             atemp              hum        
##  Min.   :1.000   Min.   :0.05913   Min.   :0.07907   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:0.33708   1st Qu.:0.33784   1st Qu.:0.5200  
##  Median :1.000   Median :0.49833   Median :0.48673   Median :0.6267  
##  Mean   :1.395   Mean   :0.49538   Mean   :0.47435   Mean   :0.6279  
##  3rd Qu.:2.000   3rd Qu.:0.65542   3rd Qu.:0.60860   3rd Qu.:0.7302  
##  Max.   :3.000   Max.   :0.86167   Max.   :0.84090   Max.   :0.9725  
##    windspeed           casual         registered        cnt      
##  Min.   :0.02239   Min.   :   2.0   Min.   :  20   Min.   :  22  
##  1st Qu.:0.13495   1st Qu.: 315.5   1st Qu.:2497   1st Qu.:3152  
##  Median :0.18097   Median : 713.0   Median :3662   Median :4548  
##  Mean   :0.19049   Mean   : 848.2   Mean   :3656   Mean   :4504  
##  3rd Qu.:0.23321   3rd Qu.:1096.0   3rd Qu.:4776   3rd Qu.:5956  
##  Max.   :0.50746   Max.   :3410.0   Max.   :6946   Max.   :8714
#convert date column to date type
bike_data$dteday <- as.Date(bike_data$dteday)

boxplot(bike_data)

ggplot(bike_data, aes(x = dteday, y = cnt)) +
  geom_line() +
  labs(title = "Daily Bike Rentals", x = "Date", y = "Count")

#mean and median temps over seasons 
bike_data %>%
  group_by(season) %>%
  summarize(mean_temp = mean(temp), median_temp = median(temp))
## # A tibble: 4 × 3
##   season mean_temp median_temp
##    <dbl>     <dbl>       <dbl>
## 1      1     0.298       0.286
## 2      2     0.544       0.562
## 3      3     0.706       0.715
## 4      4     0.423       0.409
# Mean temperature, humidity, wind speed, and total rentals per month
bike_data %>%
  
  group_by(mnth) %>%
  summarize(mean_temp = mean(temp),
            mean_humidity = mean(hum),
            mean_windspeed = mean(windspeed),
            total_rentals = sum(cnt))
## # A tibble: 12 × 5
##     mnth mean_temp mean_humidity mean_windspeed total_rentals
##    <dbl>     <dbl>         <dbl>          <dbl>         <dbl>
##  1     1     0.236         0.586          0.206        134933
##  2     2     0.299         0.567          0.216        151352
##  3     3     0.391         0.588          0.223        228920
##  4     4     0.470         0.588          0.234        269094
##  5     5     0.595         0.689          0.183        331686
##  6     6     0.684         0.576          0.185        346342
##  7     7     0.755         0.598          0.166        344948
##  8     8     0.709         0.638          0.173        351194
##  9     9     0.616         0.715          0.166        345991
## 10    10     0.485         0.694          0.175        322352
## 11    11     0.369         0.625          0.184        254831
## 12    12     0.324         0.666          0.177        211036
# Temperature association with bike registered/casual rentals
ggplot(bike_data, aes(x = temp)) +
  geom_point(aes(y = registered, color = "Registered")) +
  geom_point(aes(y = casual, color = "Casual")) +
  labs(title = "Temperature vs. Bike Rentals", x = "Normalized Temperature", y = "Count") +
  scale_color_manual(values = c("Registered" = "blue", "Casual" = "red"))

Task Two: Create interactive time series plots

## Read about the timetk package
# ?timetk

bike_data %>%
  plot_time_series(.date_var = dteday, .value = cnt, .interactive = TRUE, .plotly_slider = TRUE, .color_var = year(dteday))

Task Three: Smooth time series data

bike_data %>%
    plot_time_series(.date_var = dteday, .value = cnt, .smooth = TRUE)
# Load additional required packages
pacman::p_load(zoo, TTR)

# Clean the time series data
bike_data_clean <- bike_data %>%
  mutate(cnt_clean = tsclean(ts(cnt, frequency = 365)))

# Plot cleaned data
ggplot(bike_data_clean, aes(x = dteday)) +
  geom_line(aes(y = cnt, color = "Original")) +
  geom_line(aes(y = cnt_clean, color = "Cleaned")) +
  labs(title = "Cleaned Daily Bike Rentals", x = "Date", y = "Count") +
  scale_color_manual(values = c("Original" = "black", "Cleaned" = "red"))

# Apply Simple Moving Average (SMA)
bike_data_clean <- bike_data_clean %>%
  mutate(cnt_sma = SMA(cnt_clean, n = 10))

# Plot smoothed data
ggplot(bike_data_clean, aes(x = dteday)) +
  geom_line(aes(y = cnt_clean, color = "Cleaned")) +
  geom_line(aes(y = cnt_sma, color = "Smoothed (SMA)")) +
  labs(title = "Smoothed Daily Bike Rentals", x = "Date", y = "Count") +
  scale_color_manual(values = c("Cleaned" = "blue", "Smoothed (SMA)" = "red"))
## Don't know how to automatically pick scale for object of type <ts>. Defaulting
## to continuous.
## Warning: Removed 9 rows containing missing values (`geom_line()`).

# Apply Simple Exponential Smoothing
bike_ts<- ts(bike_data_clean$cnt_clean, frequency = 365)
fit_ets <- HoltWinters(bike_ts)

# Plot Exponential Smoothing
plot(fit_ets)

Task Four: Decompose and assess the stationarity of time series data

bike_decomp <- stl(bike_ts, s.window = "periodic")
plot(bike_decomp)

Task Five: Fit and forecast time series data using ARIMA models

#fit ARIMA model
fit <- auto.arima(bike_ts, seasonal = TRUE)
summary(fit)
## Series: bike_ts 
## ARIMA(1,0,3)(0,1,0)[365] with drift 
## 
## Coefficients:
##          ar1      ma1      ma2      ma3   drift
##       0.9683  -0.5912  -0.1279  -0.0937  5.7116
## s.e.  0.0224   0.0571   0.0617   0.0576  0.8318
## 
## sigma^2 = 986021:  log likelihood = -3042.81
## AIC=6097.63   AICc=6097.86   BIC=6121.05
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE
## Training set 5.853004 697.8113 385.8648 -2.699883 9.189324 0.1694626
##                      ACF1
## Training set -0.003587809
#forecast data
bike_forecast <- forecast(fit,h=60)

#plot 
autoplot(bike_forecast) +
  labs(title = "Bike Rental Forecast for Next 60 Days", x = "Date", y = "Count")

Task Six: Findings and Conclusions

Seasonal Patterns: Bike rentals exhibit clear seasonal patterns, with a higher number of rentals during warmer months and lower number of rentals in colder months. Suggesting that weather plays a significant role in bike rental.

Model Accuracy: The ARIMA model captured the overall trend and seasonality well, showing that it may be a useful tool for predicting future bike rental demand.

Next steps would be to look further into the correlation of different variables, such as does windspeed effect rentals or workday vs weekend vs holiday. These can give further insight into what is the main factors for renting a bike.